Parking has long been a latent issue in many large cities in the United States. While Americans take it for granted that on-street parking should be cheap most of the time, cruising for available spaces, illegal parking that takes over space belonging to pedestrians and cyclists, and the congestion both cause are reminding transportation planners to adjust on-street parking conditions in order to create a more efficient environment for travelers. Seattle, though it has many parking issues of its own (see the illegal parking example below), has been a pilot city in amending its parking plan with regularly recorded parking data and annually adjusted parking rates.
In this project, we aim to analyze the Annual Parking Study data published by the Seattle Department of Transportation (SDOT) and develop a model that predicts paid on-street parking occupancy in downtown Seattle. Based on internal characteristics (such as parking rate and parking spaces), spatial structure (such as neighborhood, land use, and side of the blockface), amenities (such as street signs and events), and temporal data (such as time lags, weekday, and time period), we build and evaluate a linear regression model of the parking occupancy rate.
With this forecasting model, we propose an app, Parkemon, that helps transportation planners adjust the relevant parameters and observe their influence on parking occupancy. This offers a strong reference for amending parking plans and makes planning decisions easier. (See the YouTube video for more details about the app wireframe: https://youtu.be/HJhKWXaGZic)
Data used in this project come mainly from the SDOT Annual Parking Study and the City of Seattle Open Data portal. We divide the data into four categories. Internal characteristics of the parking supply, such as parking spaces and total vehicle count, come directly from the Annual Parking Study dataset. For amenities, we used feature engineering to calculate the distance to the nearest amenity of each type, such as the distance to the nearest public school, hospital, or tourist attraction. For spatial structure, the neighborhood and paid parking sub-area are included, and we also compute spatial lags: the average parking occupancy among the nearest k blockface neighbors. For temporal data, peak hours and several time lags are calculated to measure parking occupancy one, two, three, four, twelve, and twenty-four hours earlier.
The dependent variable representing parking conditions is the parking occupancy percentage in downtown Seattle, as data outside of downtown are not sufficient. It is calculated as Total_Vehicle_Count / Parking_Spaces * 100. To make sure a linear regression model can be built without errors, we keep only observations whose parking occupancy is greater than 0.
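As a minimal sketch of this step (with hypothetical toy values standing in for the study data), the occupancy calculation and filter look like:

```r
# Toy data mimicking the Annual Parking Study fields (hypothetical values)
parking <- data.frame(
  Total_Vehicle_Count = c(8, 0, 9),
  Parking_Spaces      = c(10, 10, 10)
)

# Occupancy as a percentage of available spaces
parking$Occupancy <- parking$Total_Vehicle_Count / parking$Parking_Spaces * 100

# Keep only blockfaces with occupancy greater than zero, as in the model data
parking <- subset(parking, Occupancy > 0)
parking$Occupancy  # 80 and 90
```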
As the following map shows, parking occupancy ranges from 0 to 100%. Blockfaces in the downtown area are mostly fully occupied, shown in red on the map.
ggplot() +
geom_sf(data = nhoods, fill = "grey40") +
geom_sf(data = parking_join, aes(colour = q5(Total_Vehicle_Count)), size = .75) +
scale_colour_manual(values = palette_5_colors,
labels=qBr(parking_join,"Total_Vehicle_Count"))+
labs(title = "Parking Condition in Seattle") +
mapTheme()
A defining characteristic of parking data is that they are temporal. We reorganized the dataset to take an overview of parking occupancy on different days of the week and at different hours of the day. The following map shows parking conditions in downtown Seattle across time periods.
Parking occupancy differs considerably across days of the week, while it is similar across hours within a day. Due to dataset limits, very few parking occupancy records exist for Saturday and Sunday, as the map shows.
Parking2 %>%
  mutate(time_of_day = case_when(
    Time %in% c("08:00", "09:00", "10:00") ~ "AM Rush",
    Time %in% c("11:00", "12:00", "13:00", "14:00", "15:00") ~ "Mid-day",
    Time %in% c("16:00", "17:00", "18:00", "19:00") ~ "PM Rush",
    Time %in% c("20:00", "21:00", "22:00", "23:00", "00:00", "01:00",
                "02:00", "03:00", "04:00", "05:00", "06:00", "07:00") ~ "Overnight")) %>%
  ggplot() +
  geom_sf(data = downtown, fill = "grey40") +
  # inherit the mutated data so the facet variable time_of_day is available
  geom_sf(aes(colour = q5(Occupancy)), show.legend = "point", size = 1) +
  scale_colour_manual(values = palette_5_colors,
                      labels = qBr(Parking2, "Occupancy"),
                      name = "Quintile\nBreaks") +
  facet_grid(time_of_day ~ weekday) +
  labs(title = "Parking Occupancy Across Time in Seattle") +
  mapTheme()
The following map shows the locations of public garages across the city. They are mainly clustered in downtown.
ggplot() +
geom_sf(data = nhoods, fill = "grey40") +
geom_sf(data = garage, colour = "#FA7800", size = .75) +
labs(title = "Public parking lots and garages in Seattle") +
mapTheme()
The map below shows the distance, in feet, from each blockface to the nearest garage in downtown. Distances range from 6 to 74 feet, with most around 20 to 35 feet. The largest distances appear in the peripheral area of downtown, shown in red.
Final %>%
  ggplot() +
  geom_sf(data = downtown, fill = "grey40") +
  # fill aesthetic, so use the matching fill scale
  geom_sf(aes(fill = q5(distance_garage))) +
  scale_fill_manual(values = palette_5_colors,
                    labels = qBr(Final, "distance_garage"),
                    name = "Quintile\nBreaks") +
  labs(title = "Distance to nearest public parking lot/garage") +
  mapTheme()
The distance to the nearest garage is positively correlated with parking occupancy: the closer a blockface is to a garage, the lower its occupancy.
ggplot(filter(Final, Occupancy >= 0), aes(distance_garage, Occupancy)) +
geom_point() + geom_smooth(method="lm", se=F, colour = "#FA7800") +
labs(title = "Parking occupancy as a function of distance to garage",
x = "Distance to the nearest public parking lot/garage",
y = "Parking occupancy (%)") +
plotTheme()
The second independent variable is the parking rate for each blockface. The following histogram shows the frequency of different parking rates from 2014 to 2019 in downtown Seattle. A rate of $0 is the most common, followed by $1.50 and $3.50.
Final %>%
group_by(Rate) %>%
summarize(count = n()) %>%
ggplot(aes(reorder(Rate, -count), count)) +
geom_bar(stat = "identity", colour = "white", fill="#25CB10") +
labs(title = "Frequency of different parking rates from 2014 to 2019, Seattle",
x="Parking rate per hour($)", y="Count") +
plotTheme() +
theme(axis.text.x = element_text(angle = 45, hjust = 1))
Parking rates also vary over time. We extract the 2018 data from the full dataset. Although the data are not abundant, the figures show that parking rates are highest in the center of downtown and lower outside it.
Final2018 <- Final %>%
  filter(Study.Year == "2018")
Final2018 %>%
  mutate(time_of_day = case_when(
    Time %in% c("08:00", "09:00", "10:00") ~ "AM Rush",
    Time %in% c("11:00", "12:00", "13:00", "14:00", "15:00") ~ "Mid-day",
    Time %in% c("16:00", "17:00", "18:00", "19:00") ~ "PM Rush",
    Time %in% c("20:00", "21:00", "22:00", "23:00", "00:00", "01:00",
                "02:00", "03:00", "04:00", "05:00", "06:00", "07:00") ~ "Overnight")) %>%
  ggplot() +
  geom_sf(data = downtown, fill = "grey40") +
  # inherit the mutated data; the fill aesthetic needs the matching fill scale
  geom_sf(aes(fill = q5(Rate))) +
  scale_fill_manual(values = palette_5_colors,
                    labels = qBr(Final2018, "Rate"),
                    name = "Quintile\nBreaks") +
  facet_grid(time_of_day ~ weekday) +
  labs(title = "Parking rate in different time periods in 2018") +
  mapTheme()
The third important independent variable is land use. From the histogram, the most common land use classes are downtown, commercial and mixed use, and multi-family.
Final %>%
group_by(class_desc) %>%
summarize(count = n()) %>%
ggplot(aes(reorder(class_desc, -count), count)) +
geom_bar(stat = "identity", colour = "white", fill="#25CB10") +
labs(title = "Frequency of different land use types, Seattle",
x="Land use", y="Count") +
plotTheme() +
theme(axis.text.x = element_text(angle = 45, hjust = 1))
The land use map shows the distribution of land use types in downtown Seattle. In the core, downtown, commercial, and mixed uses dominate, while industrial, single-family, and multi-family uses are located outside the core of downtown.
ggplot() +
geom_sf(data = landuse)
For the spatial lag features, we engineered a variable measuring the average parking occupancy among the nearest k blockface neighbors. The following plot shows the correlation between parking occupancy and four spatial lag features. The patterns are similar, each showing a positive correlation, but the spatial lag over 15 blockface neighbors has the highest correlation with parking occupancy. Therefore, we select it among the four for the linear regression model.
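The lag feature itself can be sketched in base R. This is only an illustration with hypothetical midpoint coordinates; the project computes distances on projected sf geometries and uses k = 15.

```r
# Spatial lag: average occupancy of the k nearest blockface midpoints
spatial_lag <- function(xy, occupancy, k) {
  d <- as.matrix(dist(xy))            # pairwise Euclidean distances
  sapply(seq_len(nrow(xy)), function(i) {
    nn <- order(d[i, ])[2:(k + 1)]    # skip self (distance 0), take k nearest
    mean(occupancy[nn])
  })
}

# Tiny toy example: 4 blockface midpoints and their occupancy (%)
xy  <- cbind(x = c(0, 1, 0, 10), y = c(0, 0, 1, 10))
occ <- c(80, 60, 70, 10)
spatial_lag(xy, occ, k = 2)  # 65 75 70 65
```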
Parking_final %>%
dplyr::select(starts_with("lagOccupancy"), Occupancy) %>%
st_set_geometry(NULL) %>%
gather(Variable, Value, -Occupancy) %>%
ggplot(aes(Value, Occupancy)) +
geom_point() +
geom_smooth(method = "lm", se=F, colour = "#25CB10") +
facet_wrap(~Variable) +
labs(title = "Correlations between Parking Occupancy(%) and the spatial lag feature")
With the spatial lag over the 15 nearest blockfaces calculated, the map shows that the average occupancy among a blockface's 15 nearest neighbors is mostly in the range of 72% to 93% in the core of downtown.
ggplot() +
geom_sf(data = downtown, fill = "grey40") +
geom_sf(data = Parking2, aes(colour = q5(lagOccupancy15)), size = .75) +
scale_colour_manual(values = palette_5_colors,
labels=qBr(Parking2,"lagOccupancy15"))+
labs(title = "Spatial Lag(15) of Parking Occupancy in Seattle") +
mapTheme()
The following figure shows the correlations between the various time lags and parking occupancy. Clearly, the 1-, 2-, 3-, and 4-hour lags correlate with parking occupancy more strongly than the 12- and 24-hour lags. This is partly due to the limits of the dataset.
Final %>%
dplyr::select(starts_with("lagHour"), Occupancy) %>%
st_set_geometry(NULL) %>%
gather(Variable, Value, -Occupancy) %>%
ggplot(aes(Value, Occupancy)) +
geom_point() +
geom_smooth(method = "lm", se=F, colour = "#25CB10") +
facet_wrap(~Variable) +
labs(title = "Correlations between Parking Occupancy(%) and the time lag feature")
In this part, data are collected from available sources such as the SDOT Annual Parking Study and the City of Seattle Open Data portal. The variables fall into four groups: internal characteristics, amenities, spatial structure, and temporal variables. Internal characteristics represent the parking supply itself, such as parking spaces, construction, and parking rates. Amenities represent services in the nearby environment, such as schools, parks, and garages. Spatial structure captures the location of the blockfaces, such as the neighborhood, as well as spatial lags. Temporal variables represent the timing of the parking occupancy record, such as weekday, time period, and time lags.
After gathering the data, some transformation is needed to convert the raw records into variables that relate more directly to parking occupancy. For example, the parking occupancy data collected from the SDOT Annual Parking Study are geographic polylines. To relate them to amenities, the distance from each blockface to the nearest school, hospital, or garage is measured. To calculate distances with the k-nearest-neighbor method, the blockface data are converted to points by using the midpoint of each blockface to represent it.
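On projected coordinates, the nearest-amenity distance reduces to a minimum over point-to-point distances. A minimal base-R sketch with hypothetical coordinates (the project does this on sf geometries):

```r
# Distance from each blockface midpoint to its nearest amenity point
nearest_dist <- function(points, amenities) {
  apply(points, 1, function(p) {
    min(sqrt((amenities[, 1] - p[1])^2 + (amenities[, 2] - p[2])^2))
  })
}

midpoints <- cbind(c(0, 5), c(0, 0))  # hypothetical blockface midpoints
garages   <- cbind(c(3, 9), c(4, 0))  # hypothetical garage locations
nearest_dist(midpoints, garages)      # 5 4
```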
Another example is the temporal variables. Because daily and weekly parking conditions differ even on the same blockface, all temporal data are divided into four periods of the day: AM rush (8am-10am), mid-day (11am-3pm), PM rush (4pm-7pm), and overnight (8pm-7am), and five days of the week: Tuesday, Wednesday, Thursday, Saturday, and Sunday.
In this section, the pairwise relationships between the independent variables are tested. If two variables are highly correlated, they carry very similar statistical information, so it is pointless to include both. Several highly correlated variables are removed after consideration, such as lagOccupancy5, lagOccupancy10, and lagOccupancy20.
ParkingCor <- Parking_final %>%
st_set_geometry(NULL)%>%
dplyr::select(-Sub_Area,-Side,-Construction,
-Event.Closure,-Subarea.Label,-Peak,-PARKING_CATEGORY,
-Occupancy,-S_HOOD,-class_desc,-weekday,-Period,-ZONE)
M <- cor(ParkingCor)
corrplot(M, method = "number")
To build the linear regression model of parking occupancy in downtown Seattle, we select independent variables that are associated with parking occupancy but not correlated with each other. Categorical variables are converted to dummy variables.
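As an illustration of the dummy-variable step (a toy weekday column; in practice R's lm() performs this encoding implicitly for factors), base R's model.matrix() can be used:

```r
# Encode a categorical predictor as dummy (indicator) columns
df <- data.frame(weekday = factor(c("Tue", "Wed", "Thu", "Tue")))
dummies <- model.matrix(~ weekday, df)
dummies
# Intercept plus one 0/1 column per non-reference level ("Thu" is the
# reference here because factor levels sort alphabetically)
```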
To get an accurate and generalizable model, the observations are divided into a training set (70% of the data) and a test set (the remaining 30%). The idea is to check whether any randomly chosen test set can be explained well by a model fit on the rest of the data. After 30 folds of cross-validation, the model explains around 55% of the variance in parking occupancy.
The following table shows the coefficients for all numerical and categorical variables. The most important variables include peak hour, the downtown land use class, the spatial lag over 15 neighbors, and the time lags, which not only have large coefficients but are also statistically significant, with p-values smaller than 0.05.
I.Distribution of parking occupancy & parking occupancy predictions
After building the model, the distribution of predicted parking occupancy is compared with the observed one. The plot shows that the predictions follow a roughly normal distribution that matches the pattern of the observed data reasonably well.
#Accuracy Test by Creating Data Partition
inTrain3 <- createDataPartition(
y = paste(Parking_final$Sub_Area, Parking_final$Subarea.Label, Parking_final$S_HOOD,Parking_final$class_desc, Parking_final$Time, Parking_final$Date),
p = .70, list = FALSE)
Parking.training <- Parking_final[inTrain3,]
Parking.test <- Parking_final[-inTrain3,]
#Make a new regression based on the training set
reg_training <- lm(Occupancy ~ ., data = as.data.frame(Parking.training) %>%
                     dplyr::select(-geometry, -SHAPE_Length))
summary(reg_training)
#test on the testing set
Parking.test <-
Parking.test %>%
mutate(ParkingOccupancy.Predict = predict(reg_training, Parking.test),
ParkingOccupancy.Error = ParkingOccupancy.Predict - Occupancy,
ParkingOccupancy.AbsError = abs(ParkingOccupancy.Predict - Occupancy),
ParkingOccupancy.APE = (abs(ParkingOccupancy.Predict - Occupancy)) / Occupancy)
# Distribution of parking occupancy & parking occupancy predictions
as.data.frame(Parking.test) %>%
dplyr::select(Occupancy, ParkingOccupancy.Predict) %>%
gather(Variable, Value) %>%
ggplot(aes(Value, fill = Variable)) +
geom_density(alpha = 0.5) +
scale_fill_manual(values = c("#25CB10", "#FA7800")) +
labs(title="Distribution of parking occupancy & parking occupancy predictions",
x = "Occupancy/Prediction", y = "Density of observations") +
plotTheme()
II.Predicted parking occupancy as a function of observed occupancy
As shown in the plot below, the black points are the observed parking occupancy values, the orange line represents a perfect prediction where the predicted value equals the observed value, and the green line is the model's actual prediction. When parking occupancy is around 80%, the model predicts with relatively small residuals. Below 80%, the model tends to overpredict occupancy; above 80%, it tends to underpredict.
ggplot(Parking.test, aes(ParkingOccupancy.Predict, Occupancy)) +
geom_point() +
stat_smooth(data=Parking.test, aes(ParkingOccupancy.Predict, Occupancy),
method = "lm", se = FALSE, size = 1, colour="#FA7800") +
stat_smooth(data=Parking.test, aes(Occupancy, ParkingOccupancy.Predict),
method = "lm", se = FALSE, size = 1, colour="#25CB10") +
labs(title="Predicted parking occupancy as a function of\nobserved occupancy",
subtitle="Orange line represents a perfect prediction; Green line represents prediction") +
theme(plot.title = element_text(size = 18,colour = "black"))
III.Predicted parking occupancy as a function of observed occupancy across time
To be more specific, we examine predicted parking occupancy as a function of observed parking occupancy across the time periods defined above. The black line represents a perfect prediction, while the red line is the actual prediction. Overall, the prediction for Tuesday mid-day is the most accurate, lying closest to the perfect line. Saturday and Sunday predict poorly due to limited data.
Parking.test %>%
  mutate(time_of_day = case_when(
    Time %in% c("08:00", "09:00", "10:00") ~ "AM Rush",
    Time %in% c("11:00", "12:00", "13:00", "14:00", "15:00") ~ "Mid-day",
    Time %in% c("16:00", "17:00", "18:00", "19:00") ~ "PM Rush",
    Time %in% c("20:00", "21:00", "22:00", "23:00", "00:00", "01:00",
                "02:00", "03:00", "04:00", "05:00", "06:00", "07:00") ~ "Overnight")) %>%
  ggplot() +
  geom_point(aes(x = Occupancy, y = ParkingOccupancy.Predict)) +
  geom_smooth(aes(x = Occupancy, y = ParkingOccupancy.Predict),
              method = "lm", se = FALSE, color = "red") +
  geom_abline(slope = 1, intercept = 0) +
  facet_grid(time_of_day ~ weekday) +
  labs(title = "Observed vs Predicted",
       x = "Observed Occupancy",
       y = "Predicted Occupancy") +
  plotTheme()
The map below shows the parking occupancy predicted by the model. The general spatial trend matches the known occupancy pattern: the highest parking occupancy, above 90%, appears around the center of downtown Seattle, while the minority of blockfaces with lower occupancy lie outside the center.
ggplot() +
  geom_sf(data = downtown, fill = "grey40") +
  # colour by the model's predictions, matching the map title
  geom_sf(data = Parking.test, aes(colour = q5(ParkingOccupancy.Predict)),
          show.legend = "point", size = 1) +
  scale_colour_manual(values = palette_5_colors,
                      labels = qBr(Parking.test, "ParkingOccupancy.Predict"),
                      name = "Quintile\nBreaks(%)") +
  labs(title = "Predicted Parking Occupancy Map") +
  mapTheme()
Additionally, a series of maps shows the predicted results across the defined time periods. Overall, the pattern is similar to the current parking occupancy. In particular, occupancy is highest on Tuesday, and the blockfaces in the core of downtown are shown in red with the highest occupancy.
Parking.test %>%
  mutate(time_of_day = case_when(
    Time %in% c("08:00", "09:00", "10:00") ~ "AM Rush",
    Time %in% c("11:00", "12:00", "13:00", "14:00", "15:00") ~ "Mid-day",
    Time %in% c("16:00", "17:00", "18:00", "19:00") ~ "PM Rush",
    Time %in% c("20:00", "21:00", "22:00", "23:00", "00:00", "01:00",
                "02:00", "03:00", "04:00", "05:00", "06:00", "07:00") ~ "Overnight")) %>%
  ggplot() +
  geom_sf(data = downtown, fill = "grey40") +
  # inherit the mutated data so the facet variable time_of_day is available
  geom_sf(aes(colour = q5(ParkingOccupancy.Predict)), show.legend = "point", size = 1) +
  scale_colour_manual(values = palette_5_colors,
                      labels = qBr(Parking.test, "ParkingOccupancy.Predict"),
                      name = "Quintile\nBreaks") +
  facet_grid(time_of_day ~ weekday) +
  labs(title = "Predicted Parking Occupancy Across Time") +
  mapTheme()
The overall Mean Absolute Error (MAE) of the predicted parking occupancy is 0.17. We also examine the MAE across space and time periods; the figure below maps it. Blockfaces in the core of downtown, which currently have high occupancy, tend to have the highest MAE, around 20% to 29%. We do not use MAPE here because parking occupancy contains many zero values, which make MAPE undefined.
#Look at the Mean Absolute Error (MAE)
Parking.test %>%
summarize(mean(ParkingOccupancy.AbsError, na.rm = T)) %>%
st_set_geometry(NULL) %>%
pull()
#[1] 0.1749332 MAE
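A toy example (hypothetical values) of why MAPE breaks down on this data: any observed occupancy of zero puts a zero in MAPE's denominator.

```r
# MAPE divides each absolute error by the observed value
observed  <- c(0, 50, 100)
predicted <- c(10, 40, 90)
mape <- mean(abs(predicted - observed) / observed)
is.infinite(mape)  # TRUE, so MAE is used instead
```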
Parking.test %>%
  mutate(time_of_day = case_when(
    Time %in% c("08:00", "09:00", "10:00") ~ "AM Rush",
    Time %in% c("11:00", "12:00", "13:00", "14:00", "15:00") ~ "Mid-day",
    Time %in% c("16:00", "17:00", "18:00", "19:00") ~ "PM Rush",
    Time %in% c("20:00", "21:00", "22:00", "23:00", "00:00", "01:00",
                "02:00", "03:00", "04:00", "05:00", "06:00", "07:00") ~ "Overnight")) %>%
  ggplot() +
  geom_sf(data = downtown, fill = "grey40") +
  # inherit the mutated data so the facet variable time_of_day is available
  geom_sf(aes(colour = q5(ParkingOccupancy.AbsError)), show.legend = "point", size = 1) +
  scale_colour_manual(values = palette_5_colors,
                      labels = qBr(Parking.test, "ParkingOccupancy.AbsError"),
                      name = "Quintile\nBreaks") +
  facet_grid(time_of_day ~ weekday) +
  labs(title = "Parking Occupancy MAE Across Time") +
  mapTheme()
A Moran's I test examines whether there is spatial autocorrelation in the errors across parking blockfaces. The Moran's I value is close to zero, which indicates spatial randomness. The observed Moran's I, shown in orange, is not higher than all of the 999 randomly generated permutations. Thus, the test suggests the model includes features that capture the spatial structure of parking occupancy, although some factors may still be missing.
Parking.test_centroid <-
Parking.test %>%
st_centroid()%>%
st_transform(3689)
which(is.na(Parking.test_centroid$Occupancy))
Parking.test_centroid <- Parking.test_centroid[!is.na(Parking.test_centroid$Occupancy), ]
which(is.finite(Parking.test_centroid$Occupancy))
Parking.test_centroid <- Parking.test_centroid[is.finite(Parking.test_centroid$Occupancy), ]
coords.test <-
filter(Parking.test_centroid, !is.na(ParkingOccupancy.Error)) %>%
st_coordinates()
neighborList.test <- knn2nb(knearneigh(coords.test, 15))
spatialWeights.test <- nb2listw(neighborList.test, style="W")
# use the same filtered centroid data the spatial weights were built from
moranTest <- moran.mc(filter(Parking.test_centroid, !is.na(ParkingOccupancy.Error))$ParkingOccupancy.Error,
                      spatialWeights.test, nsim = 999)
ggplot(as.data.frame(moranTest$res[c(1:999)]), aes(moranTest$res[c(1:999)])) +
geom_histogram(binwidth = 0.01) +
geom_vline(aes(xintercept = moranTest$statistic), colour = "#FA7800",size=1) +
scale_x_continuous(limits = c(-1, 1)) +
labs(title="Observed and permuted Moran's I",
subtitle= "Observed Moran's I in orange",
x="Moran's I",
y="Count") +
plotTheme()
The predictive regression model aims to be both generalizable and accurate, meaning its errors should be evenly distributed across space. We split the data into two parts, with 70% for training and 30% for testing.
The cross-validation algorithm is k-fold cross-validation with k = 30, chosen with code running time in mind. The original dataset is randomly separated into 30 equally sized subsamples. The Mean Absolute Error is around 18%, which means the accuracy is moderate. Additionally, the RMSE indicates some large errors in the model's predictions, since there is a big gap between the RMSE, at 25%, and the MAE.
#Cross Validation
fitControl <- trainControl(method = "cv", number = 30)
set.seed(825)
reg_training.cv <- train(Occupancy ~ ., data = as.data.frame(Parking_final) %>%
                           dplyr::select(-geometry, -SHAPE_Length),
                         method = "lm", trControl = fitControl, na.action = na.pass)
reg_training.cv
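The RMSE-MAE gap itself is a generic effect of squaring: a few large residuals inflate RMSE much more than MAE. A toy illustration with hypothetical residuals:

```r
# One large residual among small ones widens the RMSE-MAE gap
errors <- c(1, 2, 2, 25)
mae  <- mean(abs(errors))    # 7.5
rmse <- sqrt(mean(errors^2)) # ~12.6
rmse > mae                   # TRUE
```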
The histogram of the cross-validation MAE shows that the errors mostly cluster around 18.5%. The variance of the MAE is small, indicating a small standard deviation and fair generalizability.
reg_training.cv$resample[1:10,]
sd(reg_training.cv$resample[,3])
#[1] 0.3648309
ggplot(as.data.frame(reg_training.cv$resample), aes(MAE)) +
geom_histogram(bins = 50, colour="white", fill = "#25CB10") +
labs(title="Distribution of MAE", subtitle = "k-fold cross validation; k = 30",
x="Mean Absolute Error", y="Count") +
plotTheme()
According to the following MAE by hour across weekdays and the weekend, the model generalizes similarly at the same hour on different days. For example, at 8:00 AM the MAE is generally higher than at 9:00 AM across days. On Tuesday, the MAE tends to be higher at every hour than on other days. Overall, apart from 8:00 AM, the model's predictions generalize well.
In general, the model created for this project is effective, with an R-squared around 0.54, indicating that about 54% of the variation in parking occupancy can be explained by the model. In addition, the model generalizes across different time periods such as hours, peak hours, and days of the week. While the interesting independent variables in this model are public parking lots and garages, parking rate, and land use, the results show that the most important variables relate to internal characteristics. Unfortunately, parking rate is not as important as we expected, possibly because people weigh many factors when choosing where to park; when the need to park is pressing, the rate is not a restriction.
To check the generalizability of the model, the 30-fold cross-validation returns an MAE of around 18.5, which indicates the model is generalizable. The model predicts well but has limitations, most of which come from the available parking data. The annual parking data only have records on certain days, such as Tuesday, Wednesday, Thursday, and some Saturdays and Sundays, and each day's record covers only one neighborhood, which makes accurate prediction harder.
Overall, this model could be a useful planning tool for transportation planners to adjust related factors and balance parking occupancy. The Seattle Department of Transportation could consider developing a dataset with more detailed records so that Parkemon can improve its predictive power and offer better planning advice.